Impact of the Ananya program on reproductive, maternal, newborn and child health and nutrition in Bihar, India: early results from a quasi-experimental study

Background The Government of Bihar (GoB) in India, the Bill and Melinda Gates Foundation and several non-governmental organisations launched the Ananya program to support the GoB in improving reproductive, maternal, newborn and child health and nutrition (RMNCHN) statewide. Here we summarise changes in indicators attained during the initial two-year pilot phase (2012-2013) of implementation in eight focus districts with a population of approximately 28 million, with the aim of informing subsequent scale-up.
Methods The quasi-experimental impact evaluation included statewide household surveys at two time points during the pilot phase: January-April 2012 (“baseline”) including an initial cohort of beneficiaries and January-April 2014 (“midline”) with a new cohort. The two arms were: 1) eight intervention districts, and 2) a comparison arm comprising the remaining 30 districts in Bihar where Ananya interventions were not implemented. We analysed changes in indicators across the RMNCHN continuum of care from baseline to midline in intervention and comparison districts using a difference-in-difference analysis.
Results Indicators in the two arms were similar at baseline. Overall, 40% of indicators (20 of 51) changed significantly from baseline to midline in the comparison districts unrelated to Ananya; two-thirds (n = 13) of secular indicator changes were in a direction expected to promote health. Statistically significant impact attributable to the Ananya program was found for 10% (five of 51) of RMNCHN indicators. Positive impacts were most prominent for mother’s behaviours in contraceptive utilisation.
Conclusions The Ananya program had limited impact in improving health-related outcomes during the first two-year period covered by this evaluation. The program’s theories of change and action were not powered to observe statistically significant differences in RMNCHN indicators within two years, but rather aimed to help inform program improvements and scale-up. Evaluation of large-scale programs such as Ananya using theory-informed, equity-sensitive (including gender), mixed-methods approaches can help elucidate causality and better explain pathways through which supply- and demand-side interventions contribute to changes in behaviour among the actors involved in the production of population-level health outcomes. Evidence from Bihar indicates that deep structural constraints in health system organisation and delivery of interventions pose substantial limitations on behaviour change among health care providers and beneficiaries.
Study registration ClinicalTrials.gov number NCT02726230.

VIEWPOINTS RESEARCH THEME 6: LEARNING FROM ANANYA PROGRAM IN BIHAR mitigate supply and demand-side constraints with the purpose of increasing availability, utilisation, quality and equity of high-impact RMNCHN interventions focused at home, community and first-level clinics.
Implementation of Ananya was supported by multiple grants, three of which are the focus of this paper: IFHI led by CARE India [13,37] and the SDP grant to BBC Media Action [38] for ancillary support to GoB implementation, and a grant to Mathematica to evaluate the impacts of the program. Table 1 briefly outlines key IFHI and SDP interventions that were initially piloted in eight focus districts during 2012 and 2013, with a plan to scale up effective interventions to the other 30 comparison districts. A more comprehensive description of the IFHI and SDP interventions is provided elsewhere [13,37,38]. IFHI implemented a package of interventions at the community/outreach level and in public health facilities with the intent to: 1) improve health outcomes in the eight focus districts by increasing the number and contact time of frontline worker (FLW) [including Accredited Social Health Activist (ASHAs) and Anganwadi Worker (AWW)] home visits; 2) increase the timeliness and quality of FLW interactions with households and communities; 3) develop new tracking systems to reach marginalised groups and other beneficiaries that had historically not been visited by FLWs; 4) strengthen FLW technical capacity, knowledge, and interpersonal communication skills to confidently provide accurate and stage-specific health information; and 5) build accountability and performance management systems through strengthened supervisory structures. Although not a focus of the pilot phase, IFHI also embarked on the development of comprehensive quality improvement initiatives in public facilities to improve clinical care through structured assessments, identifying gaps in coverage, and developing action plans for systematic improvement [13,39,40]. BBC Media Action piloted a range of mass-media and social and behaviour change communication interventions designed to increase demand for and adoption of priority health behaviours at the community level [20,38]. 
Interventions varied in the timing and location of introduction and testing, and included mHealth-based job aids for FLWs to improve the delivery of quality health messages, as well as multimedia messaging through television, radio and street theater to encourage the adoption of healthy behaviours.
Ananya was originally designed as a pilot program to inform the subsequent GoB-led statewide scale-up (from eight to all 38 districts in Bihar) of prioritised RMNCHN interventions and platforms. After two years of implementation, the GoB prompted an accelerated scale-up, and in late 2013, CARE India formed the Bihar Technical Support Program (BTSP) to partner with the GoB to strengthen the Bihar public health system. A description of the BTSP, including goals, interventions, grantees, and data sources, is provided elsewhere [13]. Thus, the GoB-BMGF-NGO partnership consisted of two phases: (1) two full years of intensive program implementation support to the government (2012-2013) in eight focus districts, termed Ananya (the pilot phase), followed by (2) transition to less intensive techno-managerial support to the GoB for statewide scale-up across 38 districts and a population of 104 million (2014 to present), complemented by other initiatives as described previously [13].

Evaluation design
Mathematica implemented a quasi-experimental impact evaluation of Ananya that included statewide household surveys at two time points: January through April 2012 ("baseline") and January through April 2014 ("midline") [41]. An original plan for an "endline" survey approximately five years into the program became unfeasible when the GoB began to scale up select interventions statewide in all districts in 2014. The two arms of the evaluation were the intervention arm, comprising the eight focus districts, and a comparison arm that included the remaining 30 districts in Bihar where Ananya was not being implemented during the pilot phase. The eight focus districts in the pilot phase were located in two clusters: one in the northwest region of the state (East Champaran, West Champaran, and Gopalganj) and another that was relatively accessible near the capital city of Patna (Patna, Samastipur, Begusarai, Saharsa, and Khagaria) [13].

Sampling design and survey procedures
Mathematica surveyors collected data from households using a three-stage sampling design applied at baseline, and returned to the same villages to collect surveys at midline, although from a different cohort of women. In the first stage, a representative sample of blocks (the primary sampling unit, or PSU) was randomly selected in each district, with larger districts including proportionally more PSUs. Stratified sampling by urban/rural area was performed to enrich the urban population in the sample. In the second stage, a representative set of secondary sampling units (SSUs) in the sampled PSUs was identified, with proportionally more SSUs identified in larger PSUs. SSUs were defined as villages in rural areas and blocks in urban areas. Small SSUs (those with fewer than 75 households) were combined with nearby SSUs into a single SSU before sampling. In the third stage, large rural SSUs (those with 150 households or more) were first divided into several equal-sized segments of 75 to 150 households per segment. A single segment was then randomly selected into the sample; urban SSUs were rarely much larger than 100 households, and thus this step was necessary only for rural SSUs.
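The third-stage segmentation described above can be sketched as follows. This is a minimal illustration, not the survey team's actual procedure: the rule for choosing the number of segments (as many as possible while keeping each at 75 households or more) and the function names `segment_sizes` and `pick_segment` are assumptions introduced here.

```python
import random

def segment_sizes(n_households, min_size=75, max_size=150):
    """Split a large SSU into near-equal segments of min_size..max_size households."""
    if n_households < max_size:
        # SSUs below the 150-household threshold are not segmented.
        return [n_households]
    # Assumed rule: use the largest number of segments that keeps each >= min_size.
    k = n_households // min_size
    base, extra = divmod(n_households, k)
    # Spread any remainder one household at a time over the first segments.
    return [base + 1 if i < extra else base for i in range(k)]

def pick_segment(n_households, rng):
    """Randomly select one segment; returns (segment index, segment size)."""
    sizes = segment_sizes(n_households)
    index = rng.randrange(len(sizes))
    return index, sizes[index]

if __name__ == "__main__":
    print(segment_sizes(400))                    # five segments of 80 households
    print(pick_segment(400, random.Random(0)))   # one segment chosen at random
```

Under this assumed rule every segment of an SSU with 150 or more households falls between 75 and 150 households, which matches the range stated in the text.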
Surveys were administered to maternal household respondents who had given birth in the catchment areas in the previous year. Surveys were conducted by an independent contractor (Sambodhi) in collaboration with Mathematica. Surveys focused on children ages 0-11 months because interventions were targeted most intensively on delivery and postnatal infant outcomes in the first year after delivery. Mathematica did not conduct longitudinal follow-up of the same cohort of women at baseline and midline because outcomes were focused on behaviours and practices at a particular stage of life, and women from the baseline cohort may or may not have had another child in the year prior to the midline survey; rather cross-sectional surveys were administered and thus the maternal respondents at baseline were not the same cohort as for the midline but were sampled from the same villages and segments. Mathematica considered various options and selected all non-focus districts (n = 30), where the Ananya program had not been implemented, as the primary comparison group. Comparing focus and comparison districts across both the 2012 and 2014 surveys enabled estimation of difference-in-difference as a reflection of the contribution of the Ananya program to changes in indicators over the survey period, assuming that trends in treatment and comparison would have been the same in the absence of the program, as suggested by prior Mathematica analysis [41]. This, then, also enabled assessment of changes from baseline to midline in the absence of Ananya interventions. To account for the most appropriate survey design scheme, the analysis specified the district as the first level of sampling, block as the second level (with urban/rural categorisation as stratum and appropriate finite population corrections within each stratum), and villages or urban blocks as the third level. Household-level sampling weights were also applied, which accounted for all the stages of sampling.
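As a rough illustration of how household-level weights of this kind are typically built and used, the sketch below assumes that each weight is the inverse of the product of the selection probabilities at the three sampling stages. The probabilities, outcomes and function names are hypothetical and are not taken from the survey.

```python
def design_weight(stage_probabilities):
    """Inverse of the overall selection probability across sampling stages."""
    weight = 1.0
    for p in stage_probabilities:
        weight /= p
    return weight

def weighted_proportion(outcomes, weights):
    """Weighted share of respondents with outcome == 1."""
    total_weight = sum(weights)
    return sum(y * w for y, w in zip(outcomes, weights)) / total_weight

if __name__ == "__main__":
    # Hypothetical selection probabilities for block, village and segment.
    w = design_weight([0.5, 0.25, 0.5])
    print(w)
    # Hypothetical binary indicator (1 = practice reported) with weights.
    print(weighted_proportion([1, 0, 1], [2.0, 4.0, 2.0]))
```

Variance estimation under the clustered, stratified design needs survey-specific software (the authors used Stata); this sketch covers point estimates only.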

RMNCHN indicator selection and categorisation
RMNCHN indicators (n = 51) that directly reflected the multi-faceted Ananya program were pre-specified for analysis based on review by three independent members of the Stanford analytic team with expertise in maternal and child health and the conduct of field research ( ultimately not used because: 1) they were redundant (more than one variable was assessed for the same indicator), 2) they were judged unfit due to lack of specificity of the question, 3) the item was judged unfit due to poor quality of data obtained, and 4) harmonisation was not possible due to changes in the question stem across surveys.
Indicators were first grouped into the following domains according to the continuum of care: antenatal care (ANC), birth preparedness, delivery (childbirth care), postnatal care, child nutrition/complementary feeding, child immunisation, and family planning (Table S2 in the Online Supplementary Document).
Within each of these domains, we further classified the indicators into three delivery platforms or approaches: FLW performance or behaviour, mother's behaviour, and facility care and outreach service delivery (recognising the limited emphasis on facility-based care in the pilot phase). Our aim was to characterise program impact based on continuum of care domains and delivery platforms by examining trends for subgroups of indicators. Indicators of FLW performance were based on actions carried out by FLWs, for example giving advice on various aspects of pregnancy and newborn care and conducting postnatal visits. Indicators of mother's behaviour were heavily dependent on her decision to adopt that behaviour, with a less tangible role for the FLW or a response from the health system. Indicators of facility care and outreach service delivery were those that reflected the quality of supply chains and availability and quality of facility-based care such as ANC, hygienic practice of birth attendants in facility-based deliveries, provision of iron-folic acid (IFA) tablets, family planning procedures and immunisations.

Statistical analysis
We examined the demographic characteristics of maternal respondents by their treatment allocation (focus/intervention vs non-focus/comparison district) and survey time (baseline vs midline). We reported crude percentages without adjusting for survey design or weights.
For each of the RMNCHN indicators, two multivariate regression models that accounted for the survey design were first constructed to compare the difference between the intervention and comparison by baseline and midline, respectively. To determine the effect attributable to the Ananya program, we conducted difference-in-difference (DID) analysis to model the intervention effect, accounting for the survey design [42]. The independent variables were the binary intervention group (focus vs comparison districts), study period (baseline vs midline), and an interaction term of these two factors. The DID estimator from the model is the interaction term that captured the change in reported RMNCHN indicators among women respondents attributed to the Ananya program. All three models (baseline, midline and DID model) were adjusted for potential confounding variables, including maternal age, maternal respondents' religion (Hindu vs non-Hindu), whether a woman belonged to a Scheduled Tribe or Scheduled Caste (STSC), number of children, household size, literacy, and socioeconomic status (SES) quartile. SES quartile was determined using methods based on the National Family Health Survey (NFHS)-3 [43]. Principal components analysis on the baseline data was used to compute a wealth index for each household based on characteristics likely to reflect poverty, such as the number of household members per room, the material from which the residence was constructed, and ownership of various durable goods. Coefficients from the baseline principal components analysis were used to estimate the wealth index for each woman at midline. The first principal component explained 16.5% of the variability (data not shown). Quartiles are relative to the 2012 statewide SES distribution for women who gave birth in the previous 12 months. We used survey Poisson regressions for count-type indicators while survey logistic regressions were used for binary indicators. 
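The role of the interaction term as the DID estimator can be seen in a simplified, unadjusted setting. The sketch below assumes a saturated linear model fitted to the four arm-by-period percentages. It omits the covariates, survey design and logistic link used in the actual analysis, and the input values are hypothetical.

```python
def did_from_cell_means(focus_base, focus_mid, comp_base, comp_mid):
    """Coefficients of a saturated arm x period model on four cell means (ppt)."""
    return {
        # Comparison districts at baseline.
        "intercept": comp_base,
        # Baseline gap between focus and comparison districts.
        "focus": focus_base - comp_base,
        # Secular change in comparison districts (no program).
        "midline": comp_mid - comp_base,
        # Interaction term: change in focus minus change in comparison.
        "focus_x_midline": (focus_mid - focus_base) - (comp_mid - comp_base),
    }

if __name__ == "__main__":
    # Hypothetical percentages for one indicator.
    coefs = did_from_cell_means(focus_base=53, focus_mid=70,
                                comp_base=45, comp_mid=55)
    print(coefs["focus_x_midline"])  # 7 ppt attributable to the program
```

In this toy case the focus districts improve by 17 ppt and the comparison districts by 10 ppt, so 7 ppt is attributed to the program under the parallel-trends assumption.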
We further evaluated and reported the percentage point difference of the DID estimators by estimating the marginal effect of the interaction term from the logistic regression models. Forest plots presenting DID estimators and their 95% confidence intervals (CIs) were used to summarise the impact of the Ananya program by key indicator domains (ie, across the continuum of care and delivery platforms). We also conducted exploratory analyses using the same analytic approach to test the hypothesis that sub-groups of women who delivered in facilities compared to homes, or had received either two or more third-trimester antenatal FLW home visits or one or more early postnatal FLW home visits would show differences in selected indicators compared to women who did not receive these aspects of care. For these exploratory analyses, P values for the regression models were reported and False Discovery Rate (FDR) adjustment was not done since the analyses were post hoc. Associations between intervention group and RMNCHN indicators were assessed at alpha = 0.05. Analyses were conducted in Stata version 14 [44]. Forest plots were produced using the 'ggplot2' package in R 3.4.3 [45,46]. Due to the large number of comparisons, we applied the FDR controlling procedure of Benjamini and Hochberg [47] using SAS proc multtest, which controls the expected proportion of false positives among findings declared significant by applying an upward adjustment to the P values.
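The Benjamini and Hochberg adjustment mentioned above can be illustrated with a short sketch. The authors ran this step in SAS proc multtest; the Python function below (`bh_adjust`, a name introduced here) is a stand-in that computes the standard step-up adjusted P values, and the input values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P values
    for p, q in zip(raw, bh_adjust(raw)):
        print(f"raw P = {p:.3f}, adjusted P = {q:.3f}")
```

An indicator is then declared significant when its adjusted P value is below the chosen threshold (0.05 in this evaluation).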

Ethical considerations
This study is part of the Ananya Bihar program and is registered at ClinicalTrials.gov number NCT02726230. The Stanford Institutional Review Board gave ethical approval for the analyses through protocol 39719.

Study population demographic characteristics
Characteristics of survey respondents were similar in focus and comparison districts at baseline and at midline (Table 2). The average maternal respondent was approximately 26 years old, four-fifths (82%) were Hindu, about one-quarter belonged to Scheduled Caste/Scheduled Tribe, about 60% at baseline and 50% at midline had no formal education and about 40% at baseline and 45% at midline were literate. The median household size was about six, 52%-54% of the focal children of the maternal respondents were male, and about one-third of husbands had more than one year of formal education.

Indicators at baseline
Table 3 shows baseline and midline results for focus and comparison districts and Table S3 in the Online Supplementary Document shows corresponding sample sizes. Indicators in focus and comparison districts were similar at baseline (Table 3), although some minor differences were present. At baseline, home births were higher in the comparison (40%) than the intervention districts (32%) whereas public facility births were lower in the comparison (45%) than the intervention districts (53%). Delayed newborn bathing was lower in the comparison (46%) than intervention districts (55%) but exclusive breastfeeding was higher in comparison districts (44% vs 39%) and wasting at 9-11 months of age was higher in comparison districts (40% vs 35%). Some family planning indicators were slightly higher in comparison than intervention districts. Overall, differences attributable to the program were driven by differences at midline, after two years of program implementation, and furthermore, DID analysis took baseline differences into account.
When the baby is placed unclothed on mother's chest or abdomen with skin-to-skin contact under a blanket or some clothing.
§ Cereal-based food (rice, khichdi, or bread).
|| Stunting: height-for-age z-score more than 2 standard deviations (SDs) below the median height for age of the international reference population (which was drawn from six diverse countries).
¶ Wasting: weight-for-height z-score more than 2 SDs below the reference population median.
** Underweight: weight-for-age z-score more than 2 SDs below the reference population median.
†† Immunisation based on card and self-report, combined.
‡‡ Only for women who are not currently using any modern method of contraception.

Changes in indicators in comparison districts without Ananya interventions
Examination of changes in indicators in comparison districts from baseline to midline enabled insights into secular changes in indicators across the continuum of care in the absence of direct implementation of Ananya interventions (Table 3). Significant health-promoting changes from baseline to midline in comparison districts were observed for 13 indicators, including four or more ANC checkups [four percentage point (ppt) increase, P = 0.003], pregnancy registration (ten ppt increase, P < 0.001), consumption of 90 or more IFA tablets during pregnancy (three ppt increase, P = 0.033), home deliveries (ten ppt decrease, P < 0.001), public facility deliveries (ten ppt increase, P < 0.001), receipt of conditional cash transfer payment through Janani Avam Bal Suraksha Yojana (JBSY) for facility delivery (20 ppt increase, P < 0.001), immediate wiping and drying of the newborn (16 ppt increase, P < 0.001), skin-to-skin contact (14 ppt increase, P = 0.005), delay of first bath by ≥2 days (eight ppt increase, P = 0.001), underweight (four ppt decrease, P = 0.016), immunisation card available (six ppt increase, P = 0.006), unvaccinated infants (four ppt decrease, P < 0.001), and Bacillus Calmette-Guérin (BCG) immunisation (four ppt increase, P = 0.004). In contrast, there were significant reductions in seven indicators of health promotion from baseline to midline, including identification of the place of delivery as part of birth preparedness (27 ppt decrease, P < 0.001), exclusive breastfeeding (six ppt decrease, P = 0.029), complementary feeding [ie, currently receiving any solid or semisolid food (five ppt decrease, P = 0.02), began receiving any solid or semisolid food by age 6 months (ten ppt decrease, P < 0.001)], measles immunisation (two ppt decrease, P = 0.01), fully vaccinated (three ppt decrease, P = 0.002), and use of a postpartum intrauterine device for contraception (two ppt decrease, P < 0.001).
Thus, changes in indicators associated with contextual factors unrelated to Ananya were mixed, but overall favoured improvements in indicators in the direction of health promotion. These changes were also accounted for in DID analyses (ie, final column, Table 3).

Program effects by continuum of care domains
Some improvements in DID estimators were seen for ANC and birth preparedness, postnatal care, child immunisation and family planning (Figure 1), but only five indicators (10%) showed evidence of a statistically significant improvement attributable to the Ananya program between 2012 and 2014 (Table 3).
One of ten ANC and birth preparedness indicators showed a statistically significant increase attributable to the program after FDR adjustment of P values: a ten ppt increase was observed in two or more FLW home visits in the last trimester (P = 0.041). No significant change attributable to Ananya was seen for four delivery indicators, ten postnatal care indicators, five variables reflecting child nutrition, and eight child immunisation indicators. Four of 14 family planning indicators improved significantly as a result of Ananya interventions. There was a significant three ppt increase in postpartum tubal ligation (P = 0.026) and significant increases in use of any modern method of contraception among mothers of younger children 0-5 months (11 ppt, P = 0.034) and for mothers who had more than one child (ten ppt, P = 0.038), with the largest increase in use of oral contraceptive pills (five ppt) although this change was borderline in significance (P = 0.06). The increase in contraception was significant among mothers of female children (18 ppt increase, P < 0.001) but not among mothers of males. Table 4 shows changes in newborn health behaviours between intervention and comparison districts by place of delivery (home, public facility or private facility). Among public facility births there were significant eight ppt increases in the intended practices of applying nothing to the cut umbilical cord (P = 0.017) and immediate breastfeeding (within one hour of birth) (P = 0.032) attributable to the Ananya program. Among private facility births, there was a 27 ppt increase in applying nothing to the cut umbilical cord (P < 0.001). No significant changes in practices attributable to Ananya were seen for home births.

Program effects by reach
We further examined differences in indicators attributable to Ananya program interventions by restricting our analyses to maternal respondents who received either two or more antenatal FLW home visits in the last trimester ( Table 5) or one or more FLW home visits within seven days of delivery ( Table 6). In general, there was little evidence for improved practices among women reached through antenatal ( Table 5) or postnatal ( Table 6) FLW home visits compared to the entire sample of maternal respondents ( Table  3). Among women who received two or more antenatal FLW home visits, there was a significant 17 ppt (P = 0.038) improvement in exclusive breastfeeding but a 15 ppt (P = 0.04) decrease in identification of a skilled attendant for delivery among home births ( Table 5). Among women who received a FLW home visit within seven days of delivery, there was a significant 30 ppt (P = 0.006) increase in exclusive breastfeeding but an 11 ppt (P = 0.046) decrease in immediate wiping and drying of the newborn after delivery (ie, a decrease in intended practice) ( Table 6).

Program effects by intervention implementation platform
Program effects were similar across the delivery platforms (Figure S1 in the Online Supplementary Document). Overall, one of five indicators of FLW performance (two or more FLW antenatal home visits), three family planning indicators among 30 indicators of mother's behaviour, and one of 16 indicators of facility/outreach service delivery (postpartum tubal ligation) showed significant improvements attributable to Ananya after FDR adjustment. Examination of indicators categorised by both continuum of care and delivery platform revealed that the most consistent, substantial gains attributable to Ananya were seen in mothers' family planning behaviours, especially in utilisation of modern contraception (Figure S2 in the Online Supplementary Document).
Only one statistically significant improvement was seen for antenatal and birth preparedness (FLW home visits), and none for delivery care, postnatal care, child nutrition or immunisation. Three of the five indicators showing improvement reflected mother's behaviours in utilising modern contraception. Among sub-samples of infants by place of birth, improvements were found for cord care in public and private facility births but not for home births, and immediate breastfeeding improved in public but not private facilities. Overall, no improvements attributable to the Ananya program were seen in postnatal care practices among home births. There was limited evidence (only for exclusive breastfeeding) that antenatal or postnatal FLW home visits were associated with improvements in delivery care or newborn care, although this result must be viewed with caution given the possibility of selection bias, in that characteristics of women who received these visits could have differed between the Ananya program and the comparison districts.
In other analyses, we found that social and behavioural change communication interventions led by BBC Media Action showed more robust and consistent improvements in desired behaviours among mothers who were exposed to mHealth interventions [38]; however, staged implementation and low exposure levels to these interventions at this stage of program implementation and scale-up likely contributed to the lack of population-level impact seen here. Moreover, while we found strong evidence for health impact associated with SHG membership, SHG interventions had not yet been scaled up by the time of the Mathematica midline evaluation [48,49].
In Bihar (and across India), there were substantial secular declines, based on population-level survey data (eg, Annual Health Survey, Sample Registration System), in maternal mortality ratio and smaller but steady reductions in infant mortality rate, neonatal mortality rate, under-5 mortality rate and total fertility rate (Table S4 in the Online Supplementary Document). Similar to the changes in indicators in comparison districts from baseline to midline in our study, changes in health indicators in large-scale survey data were mixed but mostly showed improvements over a similar time frame. According to Annual Health Survey (AHS) data, institutional births were steadily increasing and there were small increases in measures of ANC (3+ ANC visits and consumption of IFA), immediate and exclusive breastfeeding, child immunisation (eg, children 12-23 months fully immunised) and use of modern contraception (Table S5 in the Online Supplementary Document). Other AHS indicators, however, such as early postnatal visits and complementary feeding practices (eg, children receiving solid or semi-solid food and breastmilk) showed little secular change during the study period. Statewide secular increases in key health measures may be attributable to multiple statewide government programs, campaigns, and political commitments to improve economic conditions and health services that were operational during this time period. For example, funding levels were increasing from the central government to the National Rural Health Mission with a focus on reaching marginalised communities with priority health interventions and increasing numbers of health workers; and the Janani Avam Bal Suraksha Yojana (JBSY) program provided a cash transfer to women to incentivise facility deliveries [50]. These data illustrate the importance of taking into account changes in indicators due to influences outside the program when attempting to assess impact attributable to program implementation.
Failing to do so in the Ananya program area could result in an overestimation of program impact. Thus, given the varying magnitude and pace of improvement of health measures in Bihar during this period, the DID design of this evaluation was critical for assessing the attributable effect of the Ananya program on RMNCHN behaviours.
Interpretation of evaluation results must consider the large scale and ambitious scope of Ananya coupled with the short, two-year midline in the context of a planned five-year initiative. The Ananya interventions sought to saturate the health system at community and, to a lesser extent, at facility levels and through outreach by rolling out a number of interventions through various delivery platforms [13]. Timing of intervention introduction and ramp-up of implementation varied, however, and thus many interventions were operational for even shorter periods of time. More time for continuation of program implementation under the Ananya management structure with intensive support to GoB implementation may have yielded more consistent and higher magnitude effects. Further, the midline data were collected during the period when GoB implementation was transitioning to scale-up and thus intensity of implementation and Ananya program support to the GoB may have already been declining [13]. Additionally, improvements across multiple health indicators may have been limited by supply-side constraints. Last-mile supply chain and logistics management were challenging as many commodities were centrally procured, and supply chain management improvements for commodities like IFA tablets and modern contraception methods such as condoms and pills were not addressed in the first two years of the program. Similarly, improvements in complementary feeding behaviours through FLW counseling/advice may have been constrained by lack of household access to certain foods (eg, vegetables and meats). Ananya did not focus on complementary efforts such as providing nutritional supplements or promoting increases in household purchasing power that, when coupled with FLW counseling on complementary feeding and nutrition, might have generated more positive impact in this domain.
The evaluation design was not capable of identifying specific demand-side barriers to adoption. For example, babies being breastfed within one hour of birth for home deliveries decreased by four ppt, but we do not know if the constraint to adoption pertained to knowledge, attitude, skills, efficacy, or social norms. We conducted a realist evaluation in two districts of Bihar outside the Ananya pilot study area to characterise motivational mechanisms among ASHAs in Bihar [51]. Findings suggest that further efforts to nurture and sustain FLWs' intrinsic motivation may be necessary for improving their performance in engaging beneficiaries in behaviour change to improve health. While the program sought to use data to help improve program approaches and GoB implementation and health impact, additional qualitative research, including assessment of implementation processes, and further emphasis on mixed methods research to gain insights into implementation successes and failures may have been instructive.
Other than two randomised controlled trials implemented during this period [19,20], it is not possible to attribute results to specific interventions or platforms. The Mathematica surveys were particularly limited in insight into the impact of facility-based interventions during this time period, with the exception of a small number of neonatal health behaviours for babies delivered in facilities ( Table 3, Table 4). Facility-based interventions from 2012-2014 focused on filling gaps at facilities identified through self-assessment processes led by quality improvement teams [13,39,40], but quality improvement interventions were intensified and scaled up only after the study period covered by the Mathematica evaluation.
Collaborative efforts such as the PHC Performance Initiative [52] and The Lancet Global Health Commission on High-Quality Health Systems in the SDG Era [5] seek to catalyse improvements in the performance of health systems through identifying research gaps, informing the design of better measurement systems, and identifying and disseminating effective practices. These efforts underscore the need to address research gaps in health systems research, including evaluation of large-scale efforts to improve the quality of health services and measure the effects of quality-focused intervention designs on user experience, equity, and their impact on the performance of different components of the health system [5]. Similarly, research is needed to understand how performance measurement and management systems work in PHC systems [53], and the impact of socio-political dynamics on the adoption of health innovations and health-seeking behaviour [54]. Cost-effectiveness measures of large-scale RMNCHN programs are also needed.
To complement more traditional public health evaluations, program designers could utilise evaluation designs such as realist evaluations, which seek to discern "what works, for whom, in what respects, to what extent, and how" [55]. Given the challenge of evaluating programs such as Ananya which are comprised of multiple interventions and require great coordination among stakeholders, additional